A well-behaved statistic is a term sometimes used in the theory of statistics to describe part of a procedure. This usage is broadly similar to the use of well-behaved in more general mathematics. It is essentially an assumption about the formulation of an estimation procedure (which entails the specification of an estimator or statistic) that is used to avoid giving extensive details about what conditions need to hold. In particular it means that the statistic is not an unusual one in the context being studied. Due to this, the meaning attributed to well-behaved statistic may vary from context to context.
The present article is mainly concerned with the context of data mining procedures applied to statistical inference and, in particular, to the group of computationally intensive procedure that have been called algorithmic inference.
Contents |
In algorithmic inference, the property of a statistic that is of most relevance is the pivoting step which allows to transference of probability-considerations from the sample distribution to the distribution of the parameters representing the population distribution in such a way that the conclusion of this statistical inference step is compatible with the sample actually observed.
By default, capital letters (such as U, X) will denote random variables and small letters (u, x) their corresponding realizations and with gothic letters (such as ) the domain where the variable takes specifications. Facing a sample , given a sampling mechanism , with scalar, for the random variable X, we have
The sampling mechanism , of the statistic s, as a function ? of with specifications in , has an explaining function defined by the master equation:
for suitable seeds and parameter ?
In order to derive the distribution law of the parameter T, compatible with , the statistic must obey some technical properties. Namely, a statistic s is said to be well-behaved if it satisfies the following three statements:
For instance, for both the Bernoulli distribution with parameter p and the exponential distribution with parameter ? the statistic is well-behaved. The satisfaction of the above three properties is straightforward when looking at both explaining functions: if , 0 otherwise in the case of the Bernoulli random variable, and for the Exponential random variable, giving rise to statistics
and
Vice versa, in the case of X following a continuous uniform distribution on the same statistics do not meet the second requirement. For instance, the observed sample gives . But the explaining function of this X is . Hence a master equation would produce with a U sample and a solution . This conflicts with the observed sample since the first observed value should result greater than the right extreme of the X range. The statistic is well-behaved in this case.
Analogously, for a random variable X following the Pareto distribution with parameters K and A (see Pareto example for more detail of this case),
and
can be used as joint statistics for these parameters.
As a general statement that holds under weak conditions, sufficient statistics are well-behaved with respect to the related parameters. The table below gives sufficient / Well-behaved statistics for the parameters of some of the most commonly used probability distributions.
Distribution | Definition of density function | Sufficient/Well-behaved statistic |
---|---|---|
Uniform discrete | ||
Bernoulli | ||
Binomial | ||
Geometric | ||
Poisson | ||
Uniform continuous | ||
Negative exponential | ||
Pareto | ||
Gaussian | ||
Gamma |